Check stride on preallocated output for matmul (fixes #15286) #15288

timholy · 2016-02-29T12:10:44Z

CC @andreasnoack, @machiningcentre

timholy · 2016-02-29T12:11:53Z

Easy-click link: #15286.

tkelman · 2016-03-01T10:35:08Z

LGTM. Andreas?

(aside: BLIS https://github.com/flame/blis would allow arbitrary strides here)

andreasnoack · 2016-03-01T15:09:13Z

@tkelman We allow arbitrary strides, so I'm wondering how much speedup BLIS can get on matrices with special strides.

tkelman · 2016-03-01T15:36:39Z

Yeah, it depends on how well optimized the non-unit-stride case is in BLIS relative to the Julia generic gemm. Probably not commonly benchmarked, but worth trying.
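For readers following along, here is a rough sketch of the kind of dispatch being discussed. It assumes column-major storage, where a BLAS gemm accepts a leading dimension but needs the elements within a column to be adjacent (first stride equal to 1). The function names `blas_compatible` and `pick_matmul_path` are hypothetical and are not taken from Julia's source; the point is only that the preallocated output has to pass the same check as the inputs.

```python
def blas_compatible(strides):
    """True if the first (column) stride is 1, as column-major BLAS gemm assumes.

    `strides` is a (row_step, column_step) pair measured in elements.
    """
    return strides[0] == 1


def pick_matmul_path(a_strides, b_strides, c_strides):
    """Pick 'blas' only when A, B, and the preallocated output C all qualify.

    Hypothetical helper: anything else falls back to a generic,
    stride-aware loop, which is slower but handles arbitrary strides.
    """
    operands = (a_strides, b_strides, c_strides)
    if all(blas_compatible(s) for s in operands):
        return "blas"
    return "generic"


# Contiguous 4-row column-major matrices: every operand has first stride 1.
print(pick_matmul_path((1, 4), (1, 4), (1, 4)))
# Output is a view taking every other row of a parent: first stride is 2.
print(pick_matmul_path((1, 4), (1, 4), (2, 8)))
```

Skipping the check on the output is the easy mistake here, since the inputs are usually what people think about first.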

timholy · 2016-03-01T15:36:48Z

When strides get so big that there's only 1 element per cache line, I suspect the best performance might be achieved by copying (which essentially compacts the data).
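A back-of-the-envelope way to see this, assuming 64-byte cache lines, 8-byte elements, and a line-aligned base address (all assumptions for illustration, not measurements), is to count how many distinct cache lines a strided walk touches. The helper name `lines_touched` is made up for this sketch.

```python
def lines_touched(n, stride, elem_bytes=8, line_bytes=64):
    """Count distinct cache lines hit when reading n elements `stride` apart.

    Assumes the first element sits at the start of a cache line.
    """
    return len({(i * stride * elem_bytes) // line_bytes for i in range(n)})


n = 64
for stride in (1, 2, 8, 16):
    print(stride, lines_touched(n, stride))
```

Under these assumptions, 64 contiguous elements fit in 8 lines, while at a stride of 8 or more every element lands on its own line. A compacted copy pays that cost once and then works on the 8-line version.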

tkelman · 2016-03-01T15:39:44Z

I believe BLIS does copy for the non-unit-stride case in order to still use optimized SIMD operations, but only one panel at a time rather than the entire array. I should reread their papers and code, though. On typical dgemm, their Haswell kernels are quite competitive with OpenBLAS and MKL.
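To make "one panel at a time" concrete, here is a toy sketch of the general packing idea. It is not BLIS's actual algorithm or API: `pack_panel` and `matmul_packed` are hypothetical names, the layout is row-major flat lists for simplicity, and the inner loops are plain Python rather than SIMD kernels. Only a few rows of the strided matrix are copied into a contiguous buffer before each block of work.

```python
def pack_panel(data, offset, row_stride, col_stride, rows, cols):
    """Copy a rows x cols strided block into a new contiguous row-major list."""
    return [data[offset + i * row_stride + j * col_stride]
            for i in range(rows) for j in range(cols)]


def matmul_packed(a, a_strides, b, m, k, n, panel_rows=2):
    """Compute C = A * B for a strided A and a contiguous row-major k x n B.

    A is packed `panel_rows` rows at a time, so the full matrix is never
    copied at once. Returns C as a contiguous row-major list of length m * n.
    """
    row_stride, col_stride = a_strides
    c = [0.0] * (m * n)
    for i0 in range(0, m, panel_rows):
        rows = min(panel_rows, m - i0)  # last panel may be shorter
        panel = pack_panel(a, i0 * row_stride, row_stride, col_stride, rows, k)
        for i in range(rows):
            for j in range(n):
                acc = 0.0
                for p in range(k):
                    acc += panel[i * k + p] * b[p * n + j]
                c[(i0 + i) * n + j] = acc
    return c


# A is every other row and column of a 6 x 4 row-major parent: a 3 x 2 view.
parent = list(range(24))
b = [1.0, 2.0,
     3.0, 4.0]
print(matmul_packed(parent, (8, 2), b, m=3, k=2, n=2))
# prints [6.0, 8.0, 38.0, 56.0, 70.0, 104.0]
```

The tradeoff is the usual one: the copy costs extra memory traffic, but the multiply loop then reads unit-stride data, and the buffer stays small enough to remain in cache.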

timholy · 2016-03-01T15:44:07Z

Interesting. I'd be surprised if we couldn't someday match them in pure Julia with @threads and vectorization, but we aren't there yet.

tkelman · 2016-03-01T15:52:59Z

Yeah, the basic code generation patterns they use would all translate naturally into Julia-style code generation (nicer, actually, since they're leaning heavily on the C preprocessor), and we could use their kernels as inline LLVM or asm. Some day.

Check stride on preallocated output for matmul (fixes #15286)

34d1d30

timholy added the backport pending 0.4 label Feb 29, 2016

andreasnoack added a commit that referenced this pull request Mar 1, 2016

Merge pull request #15288 from JuliaLang/teh/matmul_subarray

c3b372a

Check stride on preallocated output for matmul (fixes #15286)

andreasnoack merged commit c3b372a into master Mar 1, 2016

andreasnoack deleted the teh/matmul_subarray branch March 1, 2016 15:07

tkelman pushed a commit that referenced this pull request Mar 7, 2016

Check stride on preallocated output for matmul (fixes #15286)

c747915

(cherry picked from commit 34d1d30) ref #15288

tkelman removed the backport pending 0.4 label Mar 15, 2016
